Novel Contributions: Our main contributions are: (a) the development of a (non-trivial) data-dependent
We thank the reviewers for their valuable time and thoughtful feedback. Our method also has a provably log-time prediction algorithm, enabling almost real-time predictions. We next use label partitioning to improve over NMF-GT for larger datasets (Table 2). We do mention that for Mediamill and RCV1x there were no clear label partitions. We thank the reviewers for these suggestions.
ALLoRA: Adaptive Learning Rate Mitigates LoRA Fatal Flaws
Huang, Hai, Balestriero, Randall
Low-Rank Adaptation (LoRA) is the bread and butter of Large Language Model (LLM) finetuning. LoRA learns an additive low-rank perturbation, $AB$, of a pretrained matrix parameter $W$ to align the model to a new task or dataset with $W+AB$. We identify three core limitations of LoRA for finetuning--a setting that employs a limited amount of data and training steps. First, LoRA employs Dropout to prevent overfitting. We prove that Dropout is only suitable for long training episodes but fails to converge to a reliable regularizer for short training episodes. Second, LoRA's initialization of $B$ at $0$ creates a slow training dynamic between $A$ and $B$. That dynamic is exacerbated by Dropout, which further slows the escape of $B$ from $0$ and is particularly harmful for short training episodes. Third, the scaling factor multiplying each LoRA additive perturbation creates ``short-sighted'' interactions between the LoRA modules of different layers. Motivated by a principled analysis of those limitations, we find an elegant solution: a Dropout-free, scaling-free LoRA with Adaptive Learning rate--coined ALLoRA. By scaling the per-sample and per-parameter gradients with a coefficient inversely proportional to the parameters' $\ell_2$ norm, ALLoRA alleviates those three limitations. As a by-product, ALLoRA removes two hyper-parameters from LoRA: the scaling factor and the dropout rate. Empirical results show that ALLoRA achieves better accuracy than LoRA in various settings, including against recent LoRA variants such as Weight-Decomposed Low-Rank Adaptation (DoRA). Ablation studies show our solution is optimal within a family of weight-dependent / output-dependent approaches on various LLMs including the latest Llama3.
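As a rough illustration of the norm-adaptive learning-rate idea (a minimal numpy sketch, not the paper's exact coefficient or optimizer; `allora_step` and the bounded form 1/(1 + ||theta||) are stand-ins invented here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy LoRA module: W is frozen; A and B are the trainable low-rank factors.
d, r = 8, 2
W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(d, r)) * 0.01   # LoRA factor A
B = np.zeros((r, d))                 # LoRA factor B, initialized at 0

def allora_step(param, grad, base_lr=1e-2):
    """One ALLoRA-style update: the learning rate is scaled by a
    coefficient inversely proportional to the parameter's l2 norm,
    so parameters sitting near 0 (like B at init) take relatively
    larger steps and escape 0 faster. The bounded form 1/(1 + ||.||)
    is an illustrative stand-in, not the paper's exact coefficient."""
    scale = 1.0 / (1.0 + np.linalg.norm(param))
    return param - base_lr * scale * grad

# Dummy gradients for illustration (in practice they come from backprop).
gA, gB = rng.normal(size=A.shape), rng.normal(size=B.shape)
A, B = allora_step(A, gA), allora_step(B, gB)
```

Because `B` starts at norm 0, its scale coefficient is maximal, which is exactly the property the abstract credits with speeding up the escape from the zero initialization.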
Reviews: Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
Update after author response: Thank you for the response. Additional details on whether the curves between the local optima are unique would also be interesting to see. Summary: This paper first presents a very interesting finding about the loss surfaces of deep neural nets, and then introduces a new ensembling method called Fast Geometric Ensembling (FGE). Given two already well-trained deep neural nets (with no limitations on their architectures, apparently), we have two weight vectors w1 and w2 (in a very high-dimensional space). The paper states the (surprising) fact that for any two such weights w1 and w2, we can (always?) find a simple curve connecting them along which the training loss stays low. Figure 1 demonstrates this: Left is the training accuracy plot on the 2D subspace passing through independent weights w1, w2, w3 of ResNet-164 (from different random starts), whereas Middle and Right show the 2D subspace passing through independent weights w1, w2 and one bend point w3 on the curve (Middle: Bezier, Right: Polygonal chain).
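For concreteness, the Bezier variant of such a connecting curve can be sketched in a few lines (a minimal numpy illustration in 2D; in the paper the bend point w3 is found by minimizing the loss along the curve, here it is arbitrary):

```python
import numpy as np

def bezier_point(w1, w2, w3, t):
    """Quadratic Bezier curve in weight space with endpoints w1, w2 and
    bend point w3. At t=0 it returns w1, at t=1 it returns w2."""
    return (1 - t) ** 2 * w1 + 2 * t * (1 - t) * w3 + t ** 2 * w2

w1 = np.array([0.0, 0.0])   # first trained solution
w2 = np.array([1.0, 0.0])   # second trained solution
w3 = np.array([0.5, 1.0])   # bend point (trained in the paper, arbitrary here)

# Sample weights along the curve; FGE-style ensembling averages the
# predictions of networks whose weights are drawn from such a curve.
curve = [bezier_point(w1, w2, w3, t) for t in np.linspace(0, 1, 5)]
```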
LCQ: Low-Rank Codebook based Quantization for Large Language Models
Large language models (LLMs) have recently demonstrated promising performance in many tasks. However, their high storage and computational cost has become a challenge for deployment. Weight quantization is widely used for model compression and can reduce both storage and computational cost. Most existing weight quantization methods for LLMs use a rank-one codebook for quantization, which results in substantial accuracy loss when the compression ratio is high. In this paper, we propose a novel weight quantization method, called low-rank codebook based quantization (LCQ), for LLMs. LCQ adopts a low-rank codebook, whose rank can be larger than one, for quantization. Experiments show that LCQ achieves better accuracy than existing methods with negligible extra storage cost.
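To make the baseline concrete, here is a minimal numpy sketch of the rank-one (per-row scale) quantization that the abstract says most existing methods use; LCQ's low-rank codebook generalizes this, and the sketch is not LCQ itself:

```python
import numpy as np

def quantize_rank_one(W, n_bits=4):
    """Per-row uniform quantization: row i is stored as scale[i] * q[i],
    so the dequantized matrix is diag(scale) @ Q -- the 'rank-one
    codebook' framing the abstract contrasts LCQ against."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(W / scale), -qmax - 1, qmax).astype(np.int8)
    return scale, q

def dequantize(scale, q):
    return scale * q

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))
scale, q = quantize_rank_one(W)
W_hat = dequantize(scale, q)
err = np.abs(W - W_hat).max()   # rounding error is at most scale/2 per row
```

The per-row error bound of scale/2 is what degrades at high compression ratios (small `n_bits`), which motivates richer (higher-rank) codebooks.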
Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition
Noroozi, Vahid, Majumdar, Somshubra, Kumar, Ankur, Balam, Jagadeesh, Ginsburg, Boris
In this paper, we propose an efficient and accurate streaming speech recognition model based on the FastConformer architecture. We adapt the FastConformer architecture for streaming applications by: (1) constraining both the look-ahead and past contexts in the encoder, and (2) introducing an activation caching mechanism that enables the non-autoregressive encoder to operate autoregressively during inference. The proposed model is designed to eliminate the accuracy disparity between training and inference time that is common in many streaming models. Furthermore, our proposed encoder works with various decoder configurations, including Connectionist Temporal Classification (CTC) and RNN-Transducer (RNNT) decoders. Additionally, we introduce a hybrid CTC/RNNT architecture that uses a shared encoder with both a CTC and an RNNT decoder to boost accuracy and save computation. We evaluate the proposed model on the LibriSpeech dataset and a large-scale multi-domain dataset and demonstrate that it achieves better accuracy with lower latency and inference time than a conventional buffered streaming model baseline. We also show that training a model with multiple latencies achieves better accuracy than single-latency models while enabling us to support multiple latencies with a single model. Our experiments also show that the hybrid architecture not only speeds up the convergence of the CTC decoder but also improves the accuracy of streaming models compared to single-decoder models.
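The activation-caching idea can be illustrated with a toy causal 1-D convolution: processing the sequence chunk-by-chunk while caching the last kernel-1 inputs reproduces the offline output exactly (a minimal numpy sketch, not the FastConformer implementation):

```python
import numpy as np

def causal_conv_full(x, kernel):
    """Causal 1-D convolution over the whole sequence (offline mode)."""
    k = len(kernel)
    pad = np.concatenate([np.zeros(k - 1), x])
    return np.array([pad[i:i + k] @ kernel for i in range(len(x))])

def causal_conv_streaming(x, kernel, chunk=4):
    """Same operator run chunk-by-chunk: a cache of the last k-1 inputs
    stands in for the left context, so each chunk only sees cached
    activations plus its own frames -- the essence of cache-based
    streaming inference with no train/inference mismatch."""
    k = len(kernel)
    cache = np.zeros(k - 1)
    out = []
    for s in range(0, len(x), chunk):
        seg = np.concatenate([cache, x[s:s + chunk]])
        out.extend(seg[i:i + k] @ kernel for i in range(len(x[s:s + chunk])))
        cache = seg[-(k - 1):]   # carry the left context to the next chunk
    return np.array(out)

rng = np.random.default_rng(0)
x = rng.normal(size=11)              # deliberately not a multiple of the chunk size
kernel = np.array([0.5, 0.3, 0.2])
offline = causal_conv_full(x, kernel)
streamed = causal_conv_streaming(x, kernel, chunk=4)
```

Because the cached context is identical to the left context seen offline, the two outputs match to machine precision, which is the property that removes the accuracy disparity between training and streaming inference.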
Best of Deep Neural Networks applications in 2022 part1
Abstract: Training a very deep neural network is a challenging task, as the deeper a neural network is, the more non-linear it is. We compare the performance of various preconditioned Langevin algorithms with their non-Langevin counterparts for training neural networks of increasing depth. For shallow neural networks, Langevin algorithms do not lead to any improvement; however, the deeper the network, the greater the gains provided by Langevin algorithms. Adding noise to the gradient descent allows the optimizer to escape from local traps, which are more frequent for very deep neural networks. Following this heuristic, we introduce a new Langevin algorithm called Layer Langevin, which consists in adding Langevin noise only to the weights of the deepest layers.
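A minimal numpy sketch of the Layer Langevin idea, assuming plain SGD and illustrative noise scales (`sigma` and `deep_from` are choices made here, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_langevin_step(weights, grads, lr=1e-2, sigma=1e-3, deep_from=2):
    """Gradient step that injects Gaussian (Langevin) noise only into
    layers with index >= deep_from, i.e. the deepest layers. The
    sqrt(2 * lr) noise scale follows the usual Langevin discretization;
    sigma and deep_from are illustrative, not taken from the paper."""
    new = []
    for i, (w, g) in enumerate(zip(weights, grads)):
        step = w - lr * g
        if i >= deep_from:
            step = step + sigma * np.sqrt(2 * lr) * rng.normal(size=w.shape)
        new.append(step)
    return new

weights = [np.ones((2, 2)) for _ in range(4)]   # 4-layer toy network
grads = [np.zeros_like(w) for w in weights]     # zero grads isolate the noise
new_w = layer_langevin_step(weights, grads)
```

With zero gradients, the shallow layers are returned unchanged while the two deepest layers pick up noise, which is exactly the selective perturbation the abstract describes.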
Machine learning interview preparation -- intro
So your data already has the right classification associated with it. A common use of supervised learning is to predict values for new data. With supervised learning, you have to rebuild models each time you get new labeled data to make sure the predictions stay accurate. With a dataset dedicated to the task, you can teach your model to predict which fruit it sees in a picture. Unsupervised learning -- is when you train a model with unlabeled data.
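A tiny worked example of that supervised fruit setup, using a hand-made 1-nearest-neighbor classifier (the dataset and features are invented for illustration):

```python
import numpy as np

# Toy labeled dataset: each fruit is described by (weight_g, redness 0-1),
# with the label already attached -- that is what makes it supervised.
X_train = np.array([[150, 0.9], [130, 0.8],    # apples
                    [120, 0.1], [110, 0.2]])   # bananas
y_train = np.array(["apple", "apple", "banana", "banana"])

def predict_1nn(x):
    """Predict the label of a new point as the label of its nearest
    training example -- the 'predict values for new data' use case."""
    i = np.argmin(np.linalg.norm(X_train - x, axis=1))
    return y_train[i]

print(predict_1nn(np.array([140, 0.85])))  # prints: apple
```

Adding newly labeled fruits means extending `X_train`/`y_train` and "rebuilding" the model, which here is just re-running the lookup against the larger table.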
Flood Prediction Using Machine Learning Models
Syeed, Miah Mohammad Asif, Farzana, Maisha, Namir, Ishadie, Ishrar, Ipshita, Nushra, Meherin Hossain, Rahman, Tanvir
Floods are one of nature's most catastrophic calamities, causing immense and irreversible damage to human life, agriculture, infrastructure, and socio-economic systems. Several studies on flood catastrophe management and flood forecasting systems have been conducted. Accurately predicting the onset and progression of floods in real time remains challenging. To estimate water levels and velocities across a large area, it is necessary to combine data with computationally demanding flood propagation models. This paper aims to reduce the extreme risks of this natural disaster and also contributes to policy suggestions by providing flood predictions using different machine learning models. This research will use Binary Logistic Regression, K-Nearest Neighbor (KNN), Support Vector Classifier (SVC), and Decision Tree Classifier to provide an accurate prediction. With the outcome, a comparative analysis will be conducted to understand which model delivers better accuracy.
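A hedged sketch of such a four-model comparison with scikit-learn, on synthetic stand-in features rather than the paper's flood dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))                  # stand-ins for e.g. rainfall, river level, humidity
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic "flood"/"no flood" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The four model classes named in the abstract, compared by held-out accuracy.
models = {
    "LogisticRegression": LogisticRegression(),
    "KNN": KNeighborsClassifier(),
    "SVC": SVC(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
```

On real flood data the ranking of the four models is an empirical question; the point of the sketch is only the fit/score comparison loop.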